
Conversation

@seun-ja
Contributor

@seun-ja seun-ja commented Aug 20, 2025

This PR aims to resolve #3211. The remote LLM implementation now provides two options: the default and OpenAI.

The default represents the current behaviour and, as the name suggests, is used when no option is provided.

The design is extensible while preserving the original API without breaking it.

There are concerns about wasi::llm parameters that are not compatible with OpenAI's API documentation, which may necessitate adjustments to the current definitions in the wit file. Documented here.

[Update]

In reference to @rylev's comment, there is no need to change the wit file; the implementation will just use what's available.

@itowlson
Collaborator

It would be useful to see an example of how a user selects this back-end in their runtime config. It might also be useful to add some discussion to the PR of the tradeoffs you considered when bundling this in with the existing cloud-gpu remote back-end vs. having it as a separate (typed) back end.

@seun-ja seun-ja marked this pull request as ready for review August 25, 2025 20:41
@seun-ja
Contributor Author

seun-ja commented Aug 26, 2025

The PR bundles the OpenAI API backend into the existing llm-remote-http crate because:

  • Both share a lot of common resources, which avoids unnecessary cyclic import issues
  • Integrating it with the existing crate won't break the current set-up
  • It keeps the current streamlined options, Spin (local) and RemoteHttp, while still making the latter customizable and expandable via the optional field added to the runtime-config

@itowlson, I just realised that in my PR's main comment I said "breaking it"; apologies, I meant "not breaking it" 😄

pub struct RemoteHttpCompute {
    url: Url,
    auth_token: String,
    custom_llm: Option<String>,

Contributor

I think we want this to be an enum which would allow us to constrain it to only the options implemented.
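
For illustration, a minimal sketch of the enum-based approach being suggested; the variant names here are assumptions, though later snippets in this thread show a CustomLlm enum and a #[serde(default)] field, so this roughly matches where the code ended up:

use serde::Deserialize;
use url::Url;

/// Constrains the runtime-config value to the backends actually implemented,
/// rather than accepting an arbitrary string.
#[derive(Debug, Default, Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum CustomLlm {
    /// The existing remote HTTP behaviour (used when no value is provided).
    #[default]
    Default,
    /// The OpenAI-compatible API added by this PR.
    OpenAi,
}

#[derive(Debug, Deserialize)]
pub struct RemoteHttpCompute {
    url: Url,
    auth_token: String,
    #[serde(default)]
    custom_llm: CustomLlm,
}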


tracing::info!("Sending remote inference request to {chat_url}");

let body = CreateChatCompletionRequest {

Contributor

I think we should use the generate API instead of chat completions here because our interface does not really lend itself to having multiple roles. The wit interface aligns more closely with the generate API than with chat completions.

Contributor Author

I have looked into this option; the endpoint is /v1/responses. While it is more compatible with the current wit interface, I noticed an unusual behaviour with this endpoint while testing it.

For example, it didn't work with gpt-4. I got this error:

"error": {
    "message": "The model `gpt-4` does not exist or you do not have access to it.",
    "type": "invalid_request_error",
    "param": null,
    "code": "model_not_found"
 }

But when I tried it with gpt-4o, it worked. I got a different error message, and rightly so 🫣

"error": {
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.",
    "type": "insufficient_quota",
    "param": null,
    "code": "insufficient_quota"
  }

I tried it with Ollama using the gpt-oss:20b model and got this error:

404 page not found

I dug deeper and realised Ollama doesn't have that endpoint. Ollama does have a /generate endpoint, but it isn't compatible with what we need here. It returns this:

{
    "model": "gpt-oss:20b",
    "created_at": "2025-08-28T21:17:25.432385Z",
    "response": "",
    "done": true,
    "done_reason": "load"
}

In conclusion, I think the current setup is fine: the user role matches the expected behaviour, similar to what the default remote LLM does.
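
For illustration, a rough sketch of the request shape under discussion, with the wit prompt sent as a single user-role message. Field names follow OpenAI's public chat completions schema; the type and function names here are illustrative, not necessarily the ones in the PR:

use serde::Serialize;

#[derive(Serialize)]
struct ChatMessage {
    // Always "user" here, since wasi:llm has no notion of conversation roles.
    role: String,
    // The prompt passed through the wasi:llm interface.
    content: String,
}

#[derive(Serialize)]
struct ChatCompletionRequest {
    model: String,
    messages: Vec<ChatMessage>,
    #[serde(skip_serializing_if = "Option::is_none")]
    max_tokens: Option<u32>,
    #[serde(skip_serializing_if = "Option::is_none")]
    temperature: Option<f32>,
}

fn build_request(model: &str, prompt: &str) -> ChatCompletionRequest {
    ChatCompletionRequest {
        model: model.to_owned(),
        messages: vec![ChatMessage {
            role: "user".to_owned(),
            content: prompt.to_owned(),
        }],
        max_tokens: None,
        temperature: None,
    }
}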

    url: Url,
    auth_token: String,
    #[serde(default)]
    custom_llm: CustomLlm,

Contributor

Calling it backend would probably make more sense than custom_llm, as this refers to the backing API.


#[derive(Debug, Default, serde::Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum CustomLlm {

Contributor

This probably needs a better name, something like backend or along those lines.

Contributor

@dicej dicej left a comment

LGTM. I don't have much LLM API experience, so I can't comment on the details of how this uses the OpenAI APIs, but the structure of the code seems reasonable to me.

Is there any way we can automate testing this without necessarily spending money on an OpenAI account for CI, and without writing a mock API server from scratch (which might not be very useful anyway)?

@seun-ja
Contributor Author

seun-ja commented Sep 9, 2025

Thanks for the review @dicej.

There could be a way. I'll look into that.

On the other hand, because the API is compatible with other similar LLM APIs, such as Ollama's, which is free, testing could be done against one of those.

@seun-ja seun-ja force-pushed the implement-open-ai-api-llm-backend branch from e61e83a to ccc7b34 Compare September 29, 2025 21:06
struct CreateChatCompletionResponse {
    /// A unique identifier for the chat completion.
    #[serde(rename = "id")]
    _id: String,

Collaborator

Not sure why we're deserialising these presumably unused fields? I see you'd need them with deny_unknown_fields but that doesn't seem to be turned on...?

(This seems to be a general pattern throughout. If we need it then no worries, just looks odd to me.)

}

#[derive(Deserialize)]
struct OpenAIEmbeddingUsage {

Collaborator

By the name, this seems specific to the OpenAI backend, which makes me wonder if CreateEmbeddingResponse is also OpenAI-specific. I'd expect OpenAI stuff to be in open_ai.rs but I'm not sure I'm understanding this right.

user: None,
};

let chat_url = self

Collaborator

chat_url seems a misleading name for this.


let chat_url = self
    .url
    .join("/v1/chat/completions")

Collaborator

This string appears in multiple places: should we pull it out to a const?


let chat_url = self
    .url
    .join("/v1/embeddings")

Collaborator

Again, consider creating a const
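
A sketch of the suggested refactor; EMBEDDINGS_ENDPOINT matches the constant name that appears later in this thread, while CHAT_COMPLETIONS_ENDPOINT is an assumed name:

use url::Url;

const CHAT_COMPLETIONS_ENDPOINT: &str = "/v1/chat/completions";
const EMBEDDINGS_ENDPOINT: &str = "/v1/embeddings";

// Call sites join the shared constant instead of repeating the string literal.
fn endpoint_urls(base: &Url) -> Result<(Url, Url), url::ParseError> {
    Ok((
        base.join(CHAT_COMPLETIONS_ENDPOINT)?,
        base.join(EMBEDDINGS_ENDPOINT)?,
    ))
}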

#[derive(serde::Deserialize)]
#[serde(untagged)]
enum CreateChatCompletionResponses {
    Success(CreateChatCompletionResponse),

Collaborator

This reinforces my concern about the location of CreateChatCompletionResponse


#[derive(serde::Deserialize)]
#[serde(untagged)]
enum CreateChatCompletionResponses {

Collaborator

Confused by this name. It appears to contain only one response?


#[derive(serde::Deserialize)]
#[serde(untagged)]
enum CreateEmbeddingResponses {

Collaborator

Again puzzled by the plural

Collaborator

What is this the schema for? Is it OpenAI-specific? All these message types seem a bit randomly scattered across the files: could we clarify the rhyme and reason, e.g. so maintainers know where to put new types?

Contributor Author

Yes, these are OpenAI-specific schemas (I'm probably using the word incorrectly).

I thought having it inside open_ai.rs might just make the file a bit crowded.

Collaborator

What I'd suggest:

  • Move the file into a new open_ai folder
  • Add mod schemas; to the top of open_ai.rs
  • Change references in open_ai.rs from crate::schemas to just schemas

This will make the schemas module a sub-module of the open_ai module which is a nice pattern for private stuff which would be too clutterful in the main file of the module.
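
Sketched out, the resulting structure looks roughly like the following; an inline module stands in for the separate file, and in the crate itself the former crate-level schemas.rs would become src/open_ai/schemas.rs:

// A compilable stand-in for the suggested layout: in the crate, `schemas`
// would be declared as `mod schemas;` at the top of open_ai.rs, backed by
// the file src/open_ai/schemas.rs.
mod open_ai {
    mod schemas {
        // OpenAI-specific request/response types (CreateChatCompletionRequest,
        // CreateChatCompletionResponse, ...) live here, private to open_ai.
        pub struct CreateChatCompletionRequest;
    }

    pub fn example() {
        // Within open_ai, references become `schemas::...` rather than `crate::schemas::...`.
        let _request = schemas::CreateChatCompletionRequest;
    }
}

fn main() {
    open_ai::example();
}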

@seun-ja seun-ja force-pushed the implement-open-ai-api-llm-backend branch from ccc7b34 to 78faf6b Compare September 29, 2025 22:19
@seun-ja seun-ja requested a review from itowlson September 29, 2025 22:23
@@ -0,0 +1,266 @@
#![allow(dead_code)]

Collaborator

Where is the dead code? Why is it needed? If it is needed, could we limit this declaration to specific parts of the module rather than the whole thing (so that we don't lose checking on the whole module)?
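
For reference, a sketch of scoping the lint to just the item that needs it rather than the whole module (the field names here are illustrative):

use serde::Deserialize;

#[derive(Deserialize)]
struct CreateChatCompletionResponse {
    // Deserialised but never read; allow the lint on this field only,
    // instead of a module-wide #![allow(dead_code)].
    #[allow(dead_code)]
    id: String,
    choices: Vec<String>,
}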

}

#[derive(Serialize, Debug)]
struct CreateChatCompletionRequest {

Collaborator

I am still unclear on when I should put message types in this file vs when I should put them in schemas.rs.

#[derive(Deserialize)]
struct CreateChatCompletionResponse {
    /// A unique identifier for the chat completion.
    id: String,

Collaborator

I am confused. Is this why you allowed dead code? If we can't just omit the unused fields, then your previous strategy was right; but I would have thought we could omit them e.g.

struct CCCR {
  choices: ...,
  usage: ...,
}

But if that doesn't work then the _id prefix kludge is better than the dead code kludge.

Contributor Author

Deserialization would fail if these fields are not provided, which is why I can't ignore them.

I'll revert to the underscore prefix.

Collaborator

I don't understand why deserialisation would fail. By default, if there's no Rust field that matches a JSON field, the JSON field is ignored. You need to explicitly opt into the "fail on unknown JSON" behaviour, because you never know when the server will add a new field; and as far as I can tell you haven't done that. (You had me doubting myself but I tested it: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=14e024bd7b123e568dd368bc31dcbe95)

I feel like I'm missing something here - could you clarify the nature of the failure? Thanks!
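
For reference, a minimal sketch of the default serde behaviour the playground link demonstrates: JSON fields with no matching Rust field are silently dropped unless the type opts into #[serde(deny_unknown_fields)]:

use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Trimmed {
    choices: Vec<String>,
}

fn main() {
    // `id` and `usage` have no matching Rust field, yet deserialisation succeeds.
    let json = r#"{ "id": "cmpl-123", "choices": ["hello"], "usage": { "total_tokens": 3 } }"#;
    let trimmed: Trimmed = serde_json::from_str(json).unwrap();
    println!("{trimmed:?}"); // Trimmed { choices: ["hello"] }
}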

Contributor Author

You're absolutely right, apologies.

I was confusing it with some issues I encountered earlier in this PR.

@seun-ja seun-ja force-pushed the implement-open-ai-api-llm-backend branch from 78faf6b to 6ce4fec Compare September 29, 2025 23:31
@seun-ja seun-ja requested a review from itowlson September 29, 2025 23:33

let url = self
    .url
    .join("EMBEDDINGS_ENDPOINT")

Collaborator

Suggested change
.join("EMBEDDINGS_ENDPOINT")
.join(EMBEDDINGS_ENDPOINT)

This raises some concerns about testing for me: I can't see how this could have worked.

I appreciate automated testing is hard for a service like this but can you confirm that you've tested the current code manually (both OpenAI and the changes to classic) and maybe capture what tests you performed?

Contributor Author

Thanks for catching that.

For testing, I manually tested using the example provided in this PR [./examples/open-ai-rust]. It only tests inference, and I still used it to test before the last commit.

For embeddings, I also used it but modified it slightly to ensure it worked, running Ollama locally (their API is modelled closely on OpenAI's).

Collaborator

@itowlson itowlson left a comment

LGTM apart from that typo: once that's fixed and you've fully re-run your tests, this should be good to go.

@seun-ja seun-ja force-pushed the implement-open-ai-api-llm-backend branch from 6ce4fec to b8f9973 Compare September 29, 2025 23:57
@seun-ja seun-ja requested a review from itowlson September 30, 2025 00:19
impl From<CreateChatCompletionResponse> for wasi_llm::InferencingResult {
    fn from(value: CreateChatCompletionResponse) -> Self {
        Self {
            text: value.choices[0].message.content.clone(),

Collaborator

Do we need to defend against choices being empty? Or is it guaranteed to have an element?

Contributor Author

While the documentation doesn't explicitly state that there will be at least one, a successful response is expected to always contain at least one.

I suppose it costs nothing to add a safeguard against such an event.
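
A sketch of such a safeguard, swapping the indexing for a fallible lookup (the error handling and type definitions here are illustrative, not the PR's actual ones):

// Illustrative stand-ins for the response types discussed above.
struct Message { content: String }
struct Choice { message: Message }
struct CreateChatCompletionResponse { choices: Vec<Choice> }

fn first_choice_text(response: &CreateChatCompletionResponse) -> Result<String, String> {
    // A successful response is expected to contain at least one choice,
    // but avoid panicking if it somehow doesn't.
    response
        .choices
        .first()
        .map(|choice| choice.message.content.clone())
        .ok_or_else(|| "chat completion response contained no choices".to_owned())
}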

While preserving the default HTTP client, this introduces an OpenAI client type which also supports APIs that follow OpenAI's specs. It also includes an example.

Signed-off-by: Aminu Oluwaseun Joshua <[email protected]>
@seun-ja seun-ja force-pushed the implement-open-ai-api-llm-backend branch from b8f9973 to 4226ea7 Compare September 30, 2025 00:59
@seun-ja seun-ja requested a review from itowlson September 30, 2025 01:00
@itowlson itowlson merged commit 4394e4b into spinframework:main Sep 30, 2025
17 checks passed
@seun-ja seun-ja deleted the implement-open-ai-api-llm-backend branch September 30, 2025 05:14
@karthik2804
Contributor

I am late here, but to answer:

Is there any way we can automate testing this without necessarily spending money on an OpenAI account for CI, and without writing a mock API server from scratch (which might not be very useful anyway)?

We could technically run Ollama with a small model to verify this in CI integration tests, but I am not sure whether it is worth it.

